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FOREWORD 


This  report  describes  the  evaluation  of  a  Parallel  Block  Implicit  (PBI)  integration  tech¬ 
nique  in  a  simplified  missile  trajectory.  This  project  was  carried  out  to  ascertain  the  suit¬ 
ability  of  PBI  techniques  when  modest  amounts  of  parallelism  are  available;  that  is,  when 
3  to  10  processors  are  allocated  per  missile  trajectory.  This  work  was  performed  in  the  SLBM 
Research  and  Analysis  Division  as  a  Systems  Engineering  Enhancement  (SEE)  project  in 
fiscal  year  1992. 
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ABSTRACT 


This  report  describes  the  evaluation  of  a  Parallel  Block  Implicit  (PBI)  integration  tech¬ 
nique  in  a  simplified  missile  trajectory.  This  project  was  carried  out  to  ascertain  the  suit¬ 
ability  of  PBI  techniques  when  modest  amounts  of  parallelism  are  available;  that  is,  when 
3  to  10  processors  are  allocated  per  missile  trajectory.  The  PBI  technique  was  first  evalu¬ 
ated  on  a  serial  mainframe  computer  before  it  was  implemented  in  parallel  on  an  INMOS 
TRANSPUTER  with  four  parallel  central  processing  units.  While  the  serial  implementation 
of  the  four-node  PBI  technique  indicated  that  a  speedup  of  a  factor  of  three  to  four  was 
possible  with  ideal  hardware,  in  practice  only  a  modest  gain  (approximately  30  percent)  was 
obtained  because  of  systems- related  overhead. 
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INTRODUCTION 


From  a  hardware  perspective  parallel  processing  provides  a  natural  way  to  greatly  expand 
limited  computational  resources.  This  has  given  rise  to  a  strong  trend  toward  hardware 
parallelism;  however,  software  is  frequently  unable  to  make  efficient  use  of  this  parallelism. 
There  are  numerous  reasons  for  this.  First,  as  of  early  1993,  only  rudimentary  compilers 
exist  for  converting  standard  software  into  a  suitable  parallelized  assembly  language  there 
are,  however,  development  efforts  under  way  in  this  area  for  the  Cray^  as  well  as  other 
environments.  Second,  standards  are  somewhat  lacking  for  parallelism  in  both  the  hardware 
and  compiler  arenas.  This  further  compounds  matters  by  tending  to  make  parallel  coding 
efforts  nontransportable.  On  this  standardization  front  there  are,  however,  several  bright 
prospects  with  Fortran  90  being  a  notable  example.^  Third,  for  many  hardware  architectures 
parallel  implementations  have  steep  overhead  requirements;  consequently,  if  software  is  not 
well  suited  to  the  associated  operating  system  and  hardware  environment  then  some  parallel 
implementations  can,  in  fact,  consume  more  execution  time  than  consumed  by  purely  serial 
implementations.  Finally,  physical  problems  and  engineering  systems  are  quite  frequently 
not  amenable  to  parallelism  and,  if  this  is  the  case,  ‘smart’  compilers  and  proper  hardware 
can  have  only  limited  benefits. 

There  are  two  standard  software  strategies  for  implementing  parallelism.  In  the  first 
strategy,  one  attempts  to  have  large  blocks  of  high-level  code  that  run  independently.  This 
approach  is  labeled  ‘high-level  parallelism.’  In  the  second  strategy,  parallelism  is  worked 
directly  into  the  problem  definition  at  some  basic  level.  This  second  approach  is  known  as 
‘low-level  parallelism.’  The  potential  applicability  of  both  these  approaches  will  depend  on 
the  combination  of  the  hardware  at  hand,  the  problem  under  consideration,  and  the  soft¬ 
ware  engineering  itself.  If  high-level  parallelism  is  a  viable  strategy,  it  can  frequently  be 
implemented  in  a  direct  and  straightforward  way.  For  example,  if  one  wishes  to  implement 
missile  trajectory  simulations  (which  is  the  main  problem  of  interest  here)  and  there  are  four 
completely  independent  processors  available,  one  could  directly  implement  one  missile  tra¬ 
jectory  on  each  of  the  processors  and  obtain  an  approximate  speedup  of  a  factor  of  four  with 
a  minimal  amount  of  effort — provided,  of  course,  that  neither  data  nor  memory  addressing 
problems  arise.  Given  that  such  straightforward  steps  have  been  taken  where  appropriate, 
the  following  question  arises;  Can  a  better  scheme  be  found  or  can  additional  parallelism  be 
made  use  of?  Low-level  parallelism  arises  immediately  in  this  context.  For  trajectory  soft¬ 
ware  the  problem  of  how  to  attempt  low-level  parallelism  is,  however,  enigmatic.  This  is  not 
too  surprising  since  trajectories  themselves  are  serial  in  nature  the  current  position  and 
‘velocity’  depend  directly  on  the  ‘position’  and  ‘velocity’  at  any  given  past  instant.  Stated 
differently:  All  dynamical  variables  have  a  ‘Christmas  tree  light’  or  serial-like  dependence 
on  their  past  values.  Any  successful  low-level  parallel  trajectory  integration  scheme  must  be 
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robust  enough  to  overcome  the  inherent  penalty  imposed  by  its  nonserial  nature.  This  report, 
then,  analyzes  what  can  be  accomplished  by  applying  low-level  parallelism  to  trajectory 

modeling. 

Just  as  there  are  two  basic  software  strategies  for  implementing  parallelism,  there  are  two 
basic  strategies  on  the  hardware  front.  Parallel  hardware  can  thus  be  characterized  by  two 
extremes:  ‘small-scale’  parallelism  where  four  or  so  processors  are  involved  and  ‘large-scale’ 
or  massive  parallelism  where  1000  or  more  processors  are  frequently  used.  It  is  perhaps  worth 
noting  that  while  massive  parallelism  is  rapidly  gaining  favor  as  a  preferred  way  to  engineer 
up  scale  performance,^’"*’^  the  actual  realized  performance  is  frequently  rather  disappointing. 

The  particular  evaluation  effort  at  hand  is  restricted  to  small-scale  parallelism.  The 
reasons  for  this  restriction  are  twofold:  first,  available  hardware  for  testing  was  very  re¬ 
stricted  and,  in  fact,  a  TRANSPUTER  board  with  four  individual  processing  units  mounted 
on  a  286  personal  computer  was  used*.  Second,  it  is  clear  that  the  problem  under  study 
(individual  missile  trajectories)  is  manifestly  unsuited  for  massive  parallelism.  One  relatively 
straightforward  way  to  make  use  of  low-level  parallelism  is  to  simply  use  the  vector  nature  of 
trajectory  variables  by  assigning  one  computation  per  register;  however,  this  approach  has 
severe  limitations  and  tends  to  over  utilize  overhead  resources.  The  most  promising  approach 
to  low-level  trajectory  parallelism  that  emerged  in  this  study  was  the  Parallel  Block  Implicit 
(PBI)  technique.  This  technique  was  first  implemented  on  a  serial  computer  (CDC  875) 
for  extensive  test  and  evaluation.  Since  it  performed  as  hoped  by  displaying  considerable 
promise  in  the  standard  mainframe  environment,  it  was  then  implemented  on  a  286-based 
INMOS  TRANSPUTER  with  four  processing  elements— as  mentioned  earlier.  Mr.  Alvin 
Good  of  the  SLBM  Software  Development  Division  (K50)  performed  the  TRANSPUTER 
implementation  based  on  the  supplied  formulation  this  formulation  is  included  as  Appen¬ 
dix  A.  He  subsequently  performed  a  thorough  timing  evaluation.  Overhead  proved  to  be  a 
problem  for  the  PBI  technique  just  as  it  has  for  all  other  attempts  at  low-level  parallelism  (in¬ 
cluding  in-house  attempts  based  on  COGENT  machines).  There  are  strong  indications  that 
if  hardware  without  undue  overhead  constraints  becomes  available  then  the  PBI  technique 
will  perform  as  desired. 

This  report  contains  a  brief  overview  of  parallel  trajectory  integration  strategies,  the 
merits  of  the  PBI  approach,  a  summary  of  procedures  carried  out  in  the  present  study,  a 
synopsis  of  the  test  results  obtained  on  the  TRANSPUTER  by  Mr.  Alvin  Good  of  K50, 
and  a  summary  of  the  findings.  Appendix  A  contains  mathematical  implementation  details 
for  the  PBI,  as  well  as  the  Runge/Kutta  (R/K),  approach  while  Appendix  B  contains  a 
derivation  of  the  oblate  gravity  model  used.  It  is  worth  noting  that  the  PBI  approach  is  not 
mentioned  in  standard  text  books  and  all  the  relevant  PBI  journal  articles  seem  to  predate 
the  current  trend  toward  parallelism.®’'^’®  Thus  part  of  the  intent  of  this  report  is  to  draw 
wider  attention  to  a  noteworthy  but  somewhat  underutilized  integration  technique. 


•  When  the  TRANSPUTER  boards  are  mounted  on  a  386  similar  performance  results,  but 
there  are  many  alternative  transputer  architectures.® 
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TRAJECTORY  NUMERICAL  INTEGRATION  SCHEMES 


This  section  gives  an  overview  of  various  relevant  trajectory  integration  strategies  and 
then  discusses  the  background  studies  leading  up  to  the  choice  of  the  PBI  technique  as  the 
leading  candidate  for  low-level  parallelism.  For  missile  trajectories  of  interest,  a  complicated 
engineering  system  is  being  simulated  and  thus  large  numbers  of  so  called  critical  events  can 
be  expected.  Critical  events  tend  to  strongly  reduce  the  efficiency  of  standard  ‘predictor- 
corrector’  techniques^  and  therefore  R/K  techniques  naturally  come  under  consideration. 
R/K  techniques  are  ‘self-starting’  and  thus  can  be  easily  adapted  to  problems  where  a  large 
number  of  ‘start-ups’  arise  from  critical  events.  Unfortunately,  R/K  techniques  are  strongly 
serial  in  nature  so  they  are  not  easily  adapted  to  parallelism;  moreover,  the  eflficiency  of  R/K 
techniques  in  trajectory  applications  stems  largely  from  this  serial  nature.  Within  certain 
specialized  niches,  existing  R/K  techniques  do,  however,  have  parallelization  promise.  For 
example,  iterated  implicit  R/K  techniques'®-^ ^  are,  in  general,  worth  examining  in  connec¬ 
tion  with  six  degree  of  freedom  (6-D)  trajectory  applications.  Implicit  R/K  techniques  are 
useful  in  situations  where  integration  stability  must  be  controlled;  i.e.,  for  a  class  of  systems 
governed  by  so  called  ‘stiff  differential  equations,’^^  includes  6-D  trajectory  models. 

Given  the  intrinsic  difficulty  of  successfully  parallelizing  R/K  approaches,  alternative 
integration  schemes  came  under  consideration.  Many  new  ideas  surfaced  as  potentially  in¬ 
teresting.  Most  of  the  approaches  attempted  to  overcome  the  serial  dependence  associated 
with  numerical  integration  midpoints.  (For  an  example  of  this  serial  dependence  see  the 
fourth-order  R/K  procedure  shown  in  Appendix  A,  Section  II.)  One  alternative  idea  for  an 
integration  step  procedure  was  to  use  polynomial  fitting  techniques.  In  this  approach,  one 
first  uses  a  polynomial  fit  or  other  suitable  basis  set  over  a  small  interval  in  order  to  predict 
values  for  midpoint  step  locations.  The  resulting  evaluations  are  then  used  in  a  function¬ 
alization  integration  step  update  procedure  so  as  to  complete  the  current  integration  step. 
Another  alternative  approach  was  based  on  the  fact  that  geophysical  collocation  can  be  used 
to  develop  optimal  integrators  for  geophysical  quantities  and  these  collocation  integrators 
are  easily  parallelized.  In  their  realm  of  applicability,  ‘collocation  integrators  are  frequently 
more  efficient  than  standard  approaches  because  the  covariance  information  used  by  ‘col¬ 
location  integrators’  is  more  complete  than  the  information  used  by  standard  quadrature 
integrators.  Given  this  inherent  efficiency,  the  hope  was  that  such  approaches  could  be 
adapted  to  replace  standard  trajectory  numerical  integrators.  In  particular,  the  idea  was  to 
use  (geophysical)  collocation-like  techniques  with  covariance  functions  developed  specifically 
for  the  integration  task  at  hand  as  well  as  specific  missile  types  under  consideration  so  as 
to  decouple  midpoint  step  evaluations  in  the  integration  schemes.  During  the  time  these 
and  a  number  of  other  ideas  (such  as  Encke-based  approaches  and  variation  of  parameters'^) 
were  under  preliminary  consideration,  a  thorough  literature  review  turned  up  the  PBI  tech¬ 
nique  and  none  of  these  alternatives  were  followed  up.  The  PBI  technique  has  several  of 
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the  best  features  of  the  alternative  approaches  already  direcUy  built  into  it.  Moreover,  PBI 
techniques  have  the  great  advantage  of  having  a  proven  efficiency  that  is  comparable  to  ex¬ 
isting  R/K  techniques  for  serial  implementations— something  that  could  hardly  be  expected 
of  any  of  the  novel  techniques  listed  previously  (with  the  possible  exception  of  geophysical 
collocation-like  approaches).  Perhaps  the  most  significant  aspect  of  the  PBI  technique  is 
that  it  can  be  parallelized  naturally.  Further  advantages  of  the  PBI  approach  are  discussed 

in  the  next  section. 


PARALLEL  BLOCK  IMPLICIT  APPROACH 


PBI  techniques  are  discussed  at  length  in  the  original  literature  ’  and  a  complete  imple¬ 
mentation  description  is  given  in  Appendix  A,  Section  III  (which  is  based  on  a  synopsis  found 
in  Reference  11).  Further,  since  the  main  body  of  this  report  is  introductory  in  nature,  only 
a  descriptive  treatment  is  given  in  this  section.  PBI  techniques  contain  a  certa,in  number 
of  so-called  nodes  which  are  analogous  to  the  midpoint  evaluations  of  ordinary  integration 
schemes  except  that  they  are  completely  independent  and  so  can  be  evaluated  in  parallel. 
By  subjecting  these  node  points  to  a  form  of  iteration,  or  iterative  improvement,  solutions  of 
high  order  result.  (The  order  of  an  integration  scheme  and  other  standard  numerical  analysis 
terms  are  defined  in  standard  textbook  references.^)  When  PBI  ffichniques  are  compared  with 
R/K  techniques  they  generally  involve  more  computations  per  integration  step,  but  because 
of  their  greatly  increased  order  of  integration  one  can  take  much  larger  integration  steps. 
In  general  for  serial  applications,  because  of  these  two  offsetting  factors,  PBI  techniques  are 
roughly  as  efficient  as  R/K  techniques.  Any  gain  in  PBI  efficiency  due  to  parallelism  will 
thus  result  in  direct  improvements  in  comparison  with  R/K  techniques.  A  mathematical 
description  of  four-node  PBI  techniques  can  be  found  in  Section  III  of  Appendix  A.  Paral¬ 
lelism  arises  in  the  PBI  approach  from  the  fact  that,  for  each  iteration  of  equations  (A-13), 
(A-14),  (A-15)  and  (A-16),  the  main  trajectory  evaluations  are  independent;  that  is,  F(l,s), 
F(2,s),  F(3,5),  and  F(4,5)  can  be  evaluated  on  independent  processors  for  each  given  value 
of  s. 

PBI  techniques  were  judged  as  worth  exploring,  in  part  because  of  the  following  notewor¬ 
thy  features:  (1)  PBI  techniques  are  inherently  parallel.  For  example,  the  four-node  iterative 
PBI  system  implemented  uses  four  concurrent  function  evaluations  and  thus  is  naturally 
suited  to  the  use  of  four  simultaneous,  more  or  less  equally  loaded,  processors.  Furthermore, 
if  each  of  the  3  vector  components  of  position  and  velocity  are  assigned  dedicated  processors 
then  12  processors  can,  in  theory,  be  efficiently  utilized.  In  addition,  PBI  techniques  exist 
for  various  numbers  of  nodes.  (2)  PBI  techniques  are  of  high  order.  The  four- node  PBI 
technique  implemented  is  of  order  seven  while  most  standard  in-house  R/K  methods  are  of 
order  four.  (3)  All  the  final  node  points  are  also  accurate  trajectory  evaluation  points  so 
one  can  easily  use  them  as  interpolation  points  or  as  trajectory  output  points.  (4)  When 
implemented  in  serial,  PBI  techniques  are  roughly  as  efficient  as  R/K  techniques.  There 
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is  thus  only  a  nominal  penalty  to  overcome  in  using  a  PBI  technique  in  place  of  a  R/K 
technique.  (5)  PBI  techniques  are  compatible  with  Encke  techniques  which  are  frequently 
used  in  conjunction  with  R/K  techniques  to  enhance  their  efficiency.  (6)  The  ‘step  size’  can 
be  varied  at  will  from  one  integration  cycle  to  the  next.  (7)  PBI  techniques  are  relatively 

easy  to  implement. 

From  this  list  it  is  clear  that  PBI  techniques  have  a  number  of  desirable  traits.  Thus, 
for  example,  points  (3)  and  (6)  when  taken  in  conjunction  mean  that  one  can  easily  handle 
interrupts  for  critical  events  without  greatly  hampering  the  algorithm’s  efficiency  this  is 
an  important  point  since  many  otherwise  efficient  integration  schemes  are  of  rather  limited 
SLBM  utility  because  of  it.  PBI  techniques  may  be  tentatively  considered  the  ‘bench  mark 
standard’  by  which  other  attempts  at  low-level  trajectory  numerical  integration  parallelism 
should  be  judged.  The  relative  merits  of  the  PBI  approach  are  not,  however,  completely 
clear-cut.  For  example,  while  the  chosen  PBI  implementation  is  seventh-order,  there  are  only 
five  usable  interpolation  points  per  integration  cycle.  In  addition  to  limiting  the  accuracy  of 
critical  event  predictions  (i.e.,  item  number  (3)),  this  could  make  the  algorithm  sornewhat  less 
accurate  when  time-dependent  forces  are  included.  Time-dependent  forces  occur  in  practice 
when  thrust  tables  are  introduced.  Thus,  if  the  PBI  technique  is  to  be  put  to  practical 
use  it  may  have  to  be  extended  to  handle  explicit  time  dependence  better.  A  preliminary 
examination  suggests  such  modifications  can  be  readily  implemented  when  and  if  required. 

The  two  primary  issues  in  evaluating  the  merits  of  the  PBI  approach  are  the  relative 
efficiency  and  accuracy  of  the  technique  and  the  suitability  of  the  technique  to  available 
hardware. 


TESTING  RESULTS 


To  properly  evaluate  numerical  algorithm  efficiency  and  accuracy  one  must  do  compar¬ 
isons  relative  to  other  schemes;  consequently,  the  first  step  in  this  study  was  to  compare 
the  PBI  techniques  to  a  standard  fourth-order  R/K  technique.  This  part  of  the  study  was 
performed  with  a  test  bed  trajectory  model  implemented  on  a  serial  mainframe  computer. 
After  this  mainframe  testing  phase  was  carried  out,  the  second  part  of  the  testing  phase 
involving  a  parallel  TRANSPUTER  implementation  was  performed.  The  results  of  both 
phases  of  testing  should  be  born  in  mind  when  forming  an  assessment  of  the  suitability  of 
pgj  technique  in  the  context  of  other  uses.  Available  hardware  was  somewhat  limited 
in  the  TRANSPUTER  testing  phase. 
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PROTOTYPE  TRAJECTORY  TESTING  STAGE 

First  a  ‘test  bed  trajectory  model’  was  developed.  This  test  bed  trajectory  model  was 
a  ‘synthetic’  trajectory  model  implemented  on  a  standard  serial  computer  and  served  as  a 
simplified  test  platform  for  parallel  numerical  integration  techniques.  This  model  was  devel¬ 
oped  with  a  simplified  vacuum  and  atmospheric  trajectory  reentry  phase  and  also  included 
the  ability  to  model  either  tesseral  or  oblate  gravity  for  all  phases  of  flight.  In  addition,  a 
simple  thrust  boost  model  was  included  but  was  not  used  in  this  study.  Both  a  standard 
fourth-order  R/K  and  four-node  PBI  integration  technique  were  then  implemented  in  this 
model.  Comparison  studies  of  these  two  integration  schemes  were  carried  out  for  both  the 
vacuum  and  reentry  phases  of  flight  with  both  the  oblate  and  tesseral  gravity  models.  The 
PBI  technique  was  found  to  be  capable  of  taking  integration  steps  that  are  between  three 
and  four  times  as  large  as  the  R/K  technique.  For  an  ordinary  serial  computer  this  means 
that  the  PBI  technique  is  roughly  as  efficient  as  the  R/K  technique  since,  for  example,  the 
four-node  PBI  technique  involves  four  times  as  much  computation  per  step.  However,  on  a 
parallel  computer  of  the  right  architecture  (i.e.,  a  four-element  system  with  low  overhead), 
the  four-node  PBI  technique  should  run  four  times  as  fast.  Similar  types  of  ratios  should  hold 
for  PBI  schemes  with  higher  numbers  of  nodes  if  operating  system  overhead  is  low;  i.e.,  a  six- 
node  PBI  scheme  should  be  roughly  six  times  as  fast  with  the  proper  architecture.  Testing 
on  serial  computers  thus  indicated  that  PBI  performance  was  as  indicated  in  the  technical 
literature.^' The  next  step  was  to  determine  if  the  speedup  of  a  factor  of  three  and  a 
half  could  actually  be  realized  for  available  parallel  computer  systems.  Toward  that  end, 
the  test  bed  model  was  simplified  even  further  and  a  corresponding  formulation  was  written 
for  Fortran  code  (see  Appendix  A).  These  products  were  then  passed  to  K50  personnel  who 
then  performed  the  implementation  and  timing  studies  on  the  TRANSPUTER. 


TRANSPUTER  TESTING  STAGE 

While  the  theoretical  limit  of  efficiency  of  the  PBI  technique  as  implemented  is  a  speedup 
of  a  factor  of  four,  there  is  a  hardware-imposed  theoretical  speedup  limit  of  a  factor  of  three 
due  to  limitations  of  the  four-element  TRANSPUTER  architecture  employed.  Moreover, 
with  a  four-element  TRANSPUTER  architecture  even  this  limit  of  three  is  very  hard  to 
approach  in  low-level  parallelism  experiments  because  of  hardware  overhead  considerations. 
In  fact,  any  part  of  the  trajectory  that  is  not  in  parallel  remains  in  full-time  residence  on  at 
least  one  element  and  ties  up  system  resources.  Since  the  PBI  integration  technique  is  tied 
to  trajectory  architecture  and  a  general-purpose  study  was  intended,  it  was  decided  early  on 
that  K50  would  implement  only  an  oblate  gravity  force  module  and  that  more  sophisticated 
force  model  aspects  (i.e.,  tesseral  gravity,  thrust  tables,  aerodynamic  forces,  etc.)  were  to  be 
‘emulated’  (at  least  for  timing  study  purposes)  by  looping  through  the  oblate  gravity  routine 
repeatedly.  More  precisely,  the  requisite  ‘timing  emulation  loops’  were  placed  internally  to 
Section  Il.d  (or  Ill.d)  of  Appendix  A,  but  outside  all  the  computations  in  that  section  so  as  to 
perform  the  same  computation  repeatedly.  For  such  comparison  studies,  typically  25  to  2000 
loops  might  be  expected  to  emulate  various  levels  of  complexity  found  in  standard  trajectory 
models. 
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Timing  studies  were  conducted  on  the  complete  trajectories  including  the  parts  that  were 
outside  the  integration  loop  and  so  not  in  parallel.  Because  of  overhead,  the  PBI  technique 
actually  took  more  time  than  a  serial  implementation  for  1  to  10  ‘timing  emulation  loops. 
This  result  was  expected.  At  35  loops  the  four-node  TRANSPUTER  implementation  took 
24  percent  less  time  than  the  serial  implementation.  This  24  percent  less  time  amounts  to 
a  speedup  of  a  factor  of  1.31,  which  was  an  encouraging  result.  However,  not  much  greater 
efficiency  was  obtained  as  the  number  of  loops  was  increased— at  2000  loops  there  was  a  29 
percent  savings.  This  corresponds  to  a  speedup  of  a  factor  of  1.42.  To  check  the  hypothesis 
that  TRANSPUTER  overhead  was  hampering  the  PBI  efficiency  timing  marks  were  set 
around  just  that  part  of  the  code  which  was  explicitly  parallelized  (i.e.,  the  oblate  gravity 
routine).  This  experiment  confirmed  this  hypothesis — for  example,  with  2000  loops  around 
the  oblate  gravity  computations  a  speedup  of  a  factor  of  2.85  was  obtained.  Here  the  factor  of 
2.85  is  close  to  the  theoretical  limit  of  a  factor  of  3  previously  mentioned.  Given  the  overhead 
requirements  of  available  in-house  hardware,  it  seems  that  only  modest  gains  can  be  expected 
from  any  form  of  low-level  parallelism  with  the  TRANSPUTER  architecture  used.  The  main 
question  arising  here  is  whether  a  new  TRANSPUTER  architecture  (i.e.,  one  employing 
more  processing  elements)  could  be  employed  so  as  to  greatly  lessen  overhead  problems 
unfortunately  such  hardware-specific  issues  were  outside  the  scope  of  the  present  study. 


SUMMARY  AND  CONCLUSIONS 


In  summary,  while  the  serial  mainframe  implementation  indicated  that  a  speedup  of  a 
factor  of  around  3.5  was  quite  possible  with  ideal  parallel  hardware,  only  a  modest  speedup  of 
about  1.3  was  obtained  with  the  hardware  at  hand  because  of  systems-related  overhead.  The 
problem  of  systems-related  overhead  is  by  no  means  limited  to  the  hardware  and  software 
tested  in  the  present  low-level  parallelism  study,  but  has  been  found  to  be  pervasive.  By  using 
redundant  computational  resources,  it  should  be  possible  to  circumvent  overhead  problems. 
The  speedup  of  2.85  obtained  in  the  special  experiment  mentioned  in  the  last  section  seems 
to  lend  strong  support  to  this  hypothesis.  Given  the  current  hardware  overhead  situation,  it 
seems  that  the  most  efficient  available  strategy  is  to  stick  with  high-level  parallelism  whenever 
possible.  The  PBI  approach  itself  is,  however,  worth  considering  as  a  substitute  for  certain 
R/K  applications. 
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I.  INTRODUCTION/GENERAL  PROGRAM  STRUCTURE 


This  formulation  is  intended  for  implementation  on  a  parallel  computer  and  is  designed 
to  test  out  the  efficiencies  of  a  candidate  integration  procedure  for  trajectory  parallelism  (the 
Parallel  Block  Implicit  (PBI)  integration  algorithm)  against  a  commonly  used  serial  proce¬ 
dure  (fourth-order  Runge/Kutta  (R/K)).  Since  the  relative  efficiencies  of  the  two  schemes 
are  to  be  compared,  both  must  be  independently  implemented  and  timing  runs  must  then 
be  conducted  on  both  implementations.  This  formulation  thus,  in  reality,  consists  of  two 
independent  trajectory  formulations.  The  (serial)  R/K  consist  of  Sections  Il.a  through  Il.d 
of  the  formulation,  while  the  PBI  formulation  consists  of  Sections  Ill.a  through  Ill.d.  (Here 
Section  Ill.d  is  identical  to  Section  Il.d  and  as  such  is  not  reproduced.)  Sections  Il.a  and  Ill.a 
share  ‘common  code,’  which  can  be  reused  as  noted  explicitly  in  Section  Ill.a.  Both  major 
sections’  of  the  formulation  (Section  II  and  Section  III)  are  sufficiently  straightforward  as  to 
warrant  only  several  minor  additional  comments. 

With  regards  to  the  ‘R/K  formulation’  (Section  II)  the  ‘Initial  Condition  (I.C.)  Module’ 
(Section  Il.a)  and  the  ‘R/K  Integration  Module’  (Section  Il.b)  are  the  two  major  modules 
with  the  I  C.  Module  being  called  first.  Also  the  ‘Three  Force  Module’  (Section  II.c)  and 
the  ‘Oblate  Gravity  Module’  (Section  Il.d)  are  called  from  the  ‘R/K  Integration  Module’ 
(Section  Il.b).  These  last  two  modules  are  formulated  as  ‘function  calls’  for  convenience 
(i.e..  Section  II.c  is  formulated  as  Fr;.ree(-  *  0  while  Section  Il.d  is  formulated  as  Gofc,t(-  *  0  )• 
The  R/K  used  is  a  standard  text  book  fourth-order  one— a  mathematical  description  of  the 
equations  used  can  be  found  in  the  main  introduction  to  the  ‘R/K  formulation’  (Section  II). 

With  regards  to  the  ‘PBI  formulation’  (Section  III)  the  ‘I.C.  Module’  (Section  Ill.a) 
and  the  ‘PBI  Integration  Module’  (Section  Ill.b)  are  the  two  major  modules  with  the  I.C. 
Module  being  called  first.  The  ‘Six  Force  Module’  (Section  III.c)  and  the  ‘Oblate  Gravity 
Module’  (Section  Ill.d  or  Section  Il.d)  are  called  from  the  ‘PBI  Integration  Module’  (Section 
Ill.b).  These  last  two  modules  are  formulated  as  ‘function  calls’  for  convenience  [i.e.  Section 
III.c  is  formulated  as  F,.-:,(-  •  •)  while,  as  noted  above.  Section  Ill.d  or  Il.d  is  formulated  as 
Gobit{-  •  •)  ]•  A  mathematical  description  of  the  equations  used  for  the  PBI  technique  can  be 
found  in  the  main  introduction  to  the  ‘PBI  formulation  (Section  III). 
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II.  RUNGE/KUTTA  (R/K)  IMPLEMENTATION 

The  second  order  differential  equation  to  be  integrated  is 

x  =  f(<,x,v),  (AT) 

where  three- vectors  are  shown  in  boldface  type  (i.e.,  x  and  v  are  three- vectors  while  f  is  a 
vector  valued  function).  Equation  (A-1)  must  be  integrated  in  a  step-by-step  fashion.  It  is 
standard  practice  to  introduce  a  ‘running  index,’  say  i  to  indicate  the  present  step  of  the 
integration  process.  (It  is  not  necessary  to  dimension  variables  for  this  index— all  that  is 
required  is  to  simply  make  sure  that  the  ‘new’  or  ‘updated  values  [i.e.,  the  (f  -f- 1)  th  values] 
are  distinguished  from  the  ‘current  values’  of  the  appropriate  registers  [i.e.,  the  z’th  values].) 


The  standard  fourth-order  R/K  update  equations  are 

Xi+i  =  X,-  -f  h\i  -1-  |(mo  -I-  mi  -|-  m2)  (A-2) 

v,+i  =  V,-  -h  |(mo  +  2mi  -I-  2m2  T  m3),  (A-3) 

mo  = /if(<i,Xj,Vi)  (A-4) 

mi  =  hi{ti  +  X,-  -1-  |v.-,  V,-  -I-  |mo)  ( A-5) 

m2  =  M{U  -f  |,Xi  +  fvi  -H  |mo,Vi  -[-  |mi)  (A-6) 

m3  =  /if  (t.+i ,  x;  +  hvi  +  |mi ,  V.-  +  m2)  (A-7) 


where 


NSWCDD/TR-94/217 


a.  R/K  INITIALIZATION  MODULE 


This  module  (or  ‘SUBROUTINE’)  sets  up  initial  conditions  or  ‘input’  values  for  the 
R/K  integration  software.  In  all  the  sections  that  follow,  the  dimensions  of  variables  or 
constants  are  denoted  by  appending  subscripts  and  square  brackets.  The  subscripts  denote 
the  size  of  dimension  required  explicitly.  (Thus  [xa]  means  that  x  is  a  ‘three-vector  or  array 
of  dimension  three.)  Undimensioned  variables  are  shown  in  the  input-output  list  without 
brackets.  All  variables  in  the  input-output  lists  are  of  type  REAL  unless  otherwise  noted. 

PURPOSE  OF  MODULE:  This  module  sets  up  initial  conditions  for  R/K  variables  and 
constants. 


INPUT:  None 

OUTPUT  (CONSTANTS):  [X03]  ,  [V03]  , 00,012, To, 


The  following  values  (in  engineering  units)  are  set  up  when  this  module  is  called  (other  units 
or  more  exact  values  can  be  substituted  if  required  [see  Appendix  Bj): 

ao  =  -.1407643  x  10^^ 

02  =  -.71110  X  10^^ 

Reasonable  test  values  must  also  be  supplied  for  the  following  variables  (it  is  assumed  that 
the  origin  of  the  coordinate  system  is  at  the  earth’s  center  and  that  the  z-axis  is  along  the 
polar  direction): 


f  appropriate  | 

(  appropriate 

Xo=  < 

input  values  >  , 

Vo  =  <  input  values 

[  (in  feet)  J 

[  (feet  per  second)  ^ 

To  =  input  (in  seconds) 

Tend  =  input  (in  seconds) 

=  1  or  input  (in  seconds). 


{  End  of  Module) 
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b.  R/K  TRAJECTORY  INTEGRATION  MODULE 


This  module  (or  ‘SUBROUTINE’)  is  the  main  R/K  integration  module.  (See  the  com¬ 
ments  given  in  Section  II. a  for  variable  and  constant  sizing  conventions.) 

PURPOSE  OF  MODULE:  This  module  integrates  position  and  velocity  vectors  by  applying 
equations  (A-2)  through  (A-8).  The  governing  differential  equation  is  equation  (A-1). 

INPUT  (CONSTANTS):  [X03]  ,  [V03I  ,To,Tend,  Hgip 
OUTPUT  (PRINT  OUT  ONLY):  TjXalJVa] 

VARIABLES  (‘LOCAL’):  T,  [X3],  [V3],  [F3],  [1x103] ,  [11113]  ,  [11123]  ,  [m33] 


First  initialize  time,  position  and  velocity  variables: 

T  =  To 
X  =  Xo 
V  =  Vo. 

Next  set  up  ‘main  do  loop’: 

Begin  Loop:  Do  until  T  >  Tend 

,  _  ttR/K 

"  —  ^Step 

ti+i  =  T  h. 

Next  calculate  the  ‘three-force’  (i.e.,  ‘CALL’  Section  II. c): 

F  =  Frfcree(T,X,  V) . 

Evaluate  the  right-hand  side  of  equations  (A-4),  (A-5)  and  (A-6): 


nio  =  ^F 


F  =  FrhreeiT  -b  X  -b  |V,  V  -b  |mo) 
mi  =  hF 

F  =  FThree{T  -b  |,X-b  |V  -b  |mo,V-b  |mi) 


m2  =  hF 
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F  =  Fr;.ree(i.+i,X  +  hV  +  |mi,  V  +  m2) 

m3  =  ^F . 

Update  X:  Replace  X  by 

X  +  /iV  +  |(mo  +  mi  +  m2) . 

(Operationally  this  is  a  two-step  process  [  Xneiu  =  X  +  hV  +  g(mo  +  mi  +  m2)  and 
X  =  X„e«;]  that  can  be  performed  in  a  single  step  in  most  programming  languages.) 

Similarly  update  V :  Replace  V  by 

V  +  |(mo  +  2mi  +  2m2  +  m3) . 

Update  T:  Replace  T  hy  T  +  h  (i.e.,  T  =  T  +  h). 

Next  print  the  output  variables: 

PRINT  OUT:  T,  X  and  V. 

Finally  branch  back  to  the  ‘main  do  loop  entry  point.’ 

End  test  on  T. 

{  End  of  Module} 
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c.  R/K  THREE-FORCE  MODULE  (FrAree) 


This  module  (or  ‘SUBROUTINE’)  calculates  three-force. 


PURPOSE  OF  MODULE:  In  the  governing  differential  equation  (equation  (A-1))  a  force 
evaluation  occurs  on  the  right-hand-side.  This  module  evaluates  that  force  as  required  by 
R/K  numerical  integration. 


INPUT:  r,[X3],[V3] 
OUTPUT:  FThree 
VARIABLES  (‘LOCAL’):  [G3] 


First  call  the  oblate  gravity  evaluator  (i.e.,  ‘CALL’  Section  II. d): 

G  =  Go6/t(X) . 

Next  set  up  the  value  of  FxAree  and  return: 

FrAree  =  G  . 


{  End  of  Module} 
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d.  R/K  OBLATE  GRAVITY  MODULE  (Gobit) 


This  module  (or  ‘SUBROUTINE’)  calculates  oblate  gravity  at  a  given  posiUon  (i.e.,  the 
specified  position  vector  R).  Appendix  B  provides  a  derivation  of  the  equations  implemented 

in  this  section. 

PURPOSE  OF  MODULE:  In  the  governing  differential  equation  (equation  (A-1))  a  force 
evaluation  occurs  on  the  right-hand-side.  This  module  evaluates  the  gravitational  part  of 
that  force  as  required  by  R/K  (or  PBI)  numerical  integration. 


INPUT:  [R3] 

OUTPUT:  Gobit 

INPUT  (CONSTANTS):  00,0:2 

VARIABLES  (‘LOCAL’):  [G^IR^Ga^Gb 


First  calculate  the  magnitude  of  the  vector  R: 


R  =  R\-{-  R^-V  R3 


where 


Next  calculate  intermediates: 


Ga  — 


R  = 


oo  •  02 


R5 


5Rl 


R3 


R2 


Calculate  the  gravity  components: 


Gi  =  Ri  •  Gb 
G2  =  R2-Gb 
G3  =  Rs'  {Gb  —  ^Ga)  , 


where 


Next  set  up  the  values  of  Gobit  return: 

Gobit  =  G . 


{End  of  Module) 
[End  of  R/K  Formulation] 
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III.  PARALLEL  BLOCK  IMPLICIT  (PBI)  IMPLEMENTATION 


As  noted  in  Section  II,  the  second  order  differential  equation  to  be  integrated  is 

X  =  f(t,x,v), 

where  three-vectors  are  shown  in  boldface  type  (i.e.,  x  and  v  are  three-vectors  while  f  is 
a  vector  valued  function).  Equation  (A-1)  should  be  contrasted  with  the  usual  equation 
treated  directly  by  most  numerical  integration  procedures; 

^  =  /(x,i).  (A-9) 

Equation  (A-9)  is  first  order  (instead  of  second  order)  and  it  treats  only  the  one-dimensional 
case.  Eor  standard  numerical  integration  schemes  it  is  a  simple  matter  to  treat  higher¬ 
dimensional  cases  (as  in  equation  (A-1)) — in  effect  all  that  is  necessary  is  to  replace  the 
scalar  quantities  in  equation  (A-9)  by  the  appropriate  vector-valued  ones.  The  other  defect 
can  also  be  overcome  easily.  A  second  order  differential  equation  can  be  recast  in  the  form 
of  a  first  order  differential  equation  by  simply  treating  x  and  x  as  independent  quantities 
[which  doubles  the  number  of  arguments  in  equation  (A-9)].  The  above  strategy  is  used  in 
the  PBI  implementation  given  in  this  section.  This  means  that  we  must  deal  with  a  ‘six- 
vector’  implementation  rather  than  the  ‘three-vector’  implementation  given  in  Section  II. 
The  following  paragraph  should  help  clarify  the  mathematical  details.  (In  Section  III  the 
‘running  index,’  which  indicates  the  present  step  of  the  process,  is  implicitly  understood 
and  as  such  does  not  appear  in  either  the  mathematical  equations  or  the  formulation  [except 
indirectly  during  the  update  to  the  ‘state  vector’].)  Whereas  three-vectors  have  been  denoted 
simply  by  boldface  type,  six- vectors  will  be  denoted  by  boldface  type  with  top  arrows. 

Let  the  ‘generalized  position  vector’  (i.e.,  the  ‘state  vector’)  and  ‘generalized  force’  (i.e., 
the  ‘six-force’)  be  defined  as  follows: 


'  X  ' 

'  Vx  ' 

y 

z 

Vx 

>  and  F  =  < 

Vx 

Fx 

% 

Fy 

.  Vx  . 

.  Fi  ; 

where  (s,  y,  x)  are  the  components  of  position,  (v^,  Vy,  u*)  are  the  components  of  velocity  and 
(Fj;,Fy,F2)  are  the  components  of  (three)  force.  Then  F  =  ma  =  mX  becomes 

—  =  F  =  F(t,Y).  (A-10) 

dt 
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PBI  equations  are  given  in  terms  of  Y (r,  s)  where  r  is  the  node  number  (r  -  1, 2, 3,  or  4  and 
is  a  sort  of  ‘running  index’  within  a  given  integration  cycle)  and  s  is  the  iteration  number 
(5  =  0  to  start,  then  s  =  1,2,3  and  [finally]  4).  ‘Position’  and  time  depend  on  the  iteration 

number:  Y  =  Y(r,3),  t  =  i(r).  Let  F(r,s)  =  F  (i(r),  Y(r,^))^  then  the  PBI  integration 
equations  are  given  below  (where  Y„  is  the  starting  value  ol  Y  for  the  given  integration 
cycle  j  I 

Y(r,0)  =  Yo  +  r-;i-F(r,0),  (A-11) 

where  r  =  1,2, 3,4  and  _  ^ 

F(r,0)  =  F(t(r),Yo). 

Next  iterate  the  following  set  of  equations  for  5  =  1  to  4: 

Y(1,5  +  1)  =  Yo  +  i^{251Fo  +  646F(1,5)  -  264F(2,5)  +  106F(3,s) 

-19F(4,5)}  (A-12) 

Y(2, 5  +  1)  =  Yo  +  A{29Fo  +  124F(1,  s)  +  24F(2,  s)  +  4F(3, 5)  -  F(4,  s)}  (A-13) 

Y(3,6  +  l)  =  Yo  +  i{9Fo  +  34F(l,5)  +  24F(2,s)+14F(3,5)  -F(4,5)}  (A-14) 

Y(4, 5  +  1)  =  Yo  +  i{7Fo  +  32F(1,  s)  +  12F(2,  s)  +  32F(3,  s)  +  7F(4, 5)}  (A-15) 

where 

FosF(0,0). 

This  completes  one  full  integration  step.  To  perform  the  next  integration  step  set 

Yo  =  Y(4,5)  (A-16) 

and  apply  the  same  procedure  again  (starting  with  equation  (A-11)). 

The  reader  may  have  observed  that  since  six-vectors  are  used  in  the  above  equations  one 
can  achieve  parallelism  by  simply  allocating  a  ‘computation  node’  to  each  of  the  six  inde¬ 
pendent  components.  This  is  indeed  one  strat^y  for  implementing  parallejism;  however,  the 
favored  approach  arises  from  observing  that  F(l,s),F(2,s),F(3,s)  and  F(4,s)  are  compu¬ 
tationally  independent  during  the  iteration  of  equations  (A-12),  (A-13),  (A-14)  and  (A-15) 
and,  as  such,  each  equation  can  be  assigned  to  a  specific  (and  independent)  ‘computational 

node.’ 
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a.  PBI  INITIALIZATION  MODULE 


This  module  (or  ‘SUBROUTINE’)  sets  up  initial  conditions  or ‘input’  values  for  the  PBI 
integration  software.  Much  of  the  beginning  of  this  section  is  identical  to  Section  Il.a.  (As  in 
Sections  Il.a  through  Il.d,  the  dimensions  of  variables  or  constaiits  are  denoted  by  appending 
subscripts  and  square  brackets  within  this  and  the  following  sections.  The  subscripts  indicate 
the  size  of  dimension  required  explicitly.  {Thus  [xa]  means  that  x  is  a  three- vector  or  array 
of  dimension  three.)  Undimensioned  variables  are  shown  in  the  input-output  list  without 
brackets.  All  variables  in  the  input-output  lists  are  of  type  REAL  unless  otherwise  noted.) 


PURPOSE  OF  MODULE:  This  module  sets  up  initial  conditions  for  PBI  variables  and 
constants. 


INPUT:  None 

OUTPUT  (CONSTANTS):  [X03]  ,  [V03]  ,ao,a2,ro,re„d, [B4],  [C’4x4] 


The  following  values  (in  engineering  units)  are  set  up  when  this  module  is  called  (other  units 
or  more  exact  values  can  be  substituted  if  required  [see  Appendix  B]): 

ao  =  -.1407643  x  10^^ 

02  = -.71110  X  10^^ 

Reasonable  test  values  must  also  be  supplied  for  the  following  variables  (it  is  assumed  that 
the  origin  of  the  coordinate  system  is  at  the  earth’s  center  and  that  the  i:-axis  is  along  the 
polar  direction): 

{appropriate  (  appropriate 

input  values  >  ,  Vq  =  <  input  values  > 

(in  feet)  J  [  (feet  per  second)  ^ 


To  =  input  (in  seconds) 

Tend  —  input  (in  seconds) 

Most  of  the  above  is  identical  to  Section  Il.a  (of  course  is  not  set  since  it  is  not  used 

in  this  module). 


Next  set  up  constants  indigenous  to  the  PBI  integration  technique  itself: 

ttPBI  _  1 
^Step  — 
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(Notice  that  the  PBI  ‘step  size’  here  is  basically  four  times  the  R/K  step  size 

the  node  number  used  is  four  here  [i.e.,  Section  II].) 


Bi 

_  251 

720 > 

B2  = 

.  29 
■  90’ 

a  =  3(^). 

B4  = 

Hh) 

Ci,i  = 

646 

720’ 

C'1,2  = 

- 

V720/  ’ 

_  106 

*-^1.3  ~  720’ 

Cl, 4 

=  -® 

C’2,1  = 

124 

90  ’ 

^2,2  = 

24 

90’ 

^2,3  = 

C2A 

=  -(^) 

C'a.i  = 

3(M). 

^3,2  = 

3(i). 

Cm  =  3  (g) , 

C3,4 

=  -3  (5) 

C'4,l  = 

2(i). 

<^4,2  = 

2(i). 

=  2  (H)  , 

(84,4 

=  Hi) 

Next  perform  a  ‘consistency  check:’ 

4 

4 

Sj  =  Ckj  ;  for  i  =  1, 2, 3, 4. 

k=l 

Print  the  results  and  exit  this  module: 

Print  So,  Si,  82,53,84. 


{  End  of  Module} 


since 
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b.  PBI  TRAJECTORY  INTEGRATION  MODULE 


This  module  (or  ‘SUBROUTINE’)  is  the  main  PBI  integration  module. 

PURPOSE  OF  MODULE:  This  module  integrates  position  and  velocity  vectors  by  applying 
equations  (A-2)  through  (A-7).  The  governing  differential  equation  is  equation  (A-1). 

INPUT  (CONSTANTS):  [X03I  ,  [V03]  ,ro,re„d,  [54],  [C4X4] 

OUTPUT  (PRINT  OUT  ONLY):  T, 

VARIABLES  (‘LOCAL’):  ,  [Yr^^,]  ,  [y„,]  ,  [F„e]  ,  [Fnex^J  ,[*54], [Yg], [Fe] 


First  initialize  time  and  the  ‘state  vector’  (position  and  velocity  variables): 

tp  =  To 

(  Xo(l)  1 
Xo(2) 

X„(3) 

“  1  V„(l) 

V„(2) 

I  Vo(3)  J 

where  Xo(fc)  =  [Xo]fc  for  A:  =  1,2,3  (and  similarly  for  Vo). 

Next  set  up  ‘main  do  loop’: 

Begin  Loop:  Do  until  T  >  Tend 

T  =  t, 


Next  calculate  the  ‘six-force’  (i.e.,  ‘CALL’  Section  III.c) 

F  =  F,.-,(r,Y). 


Set 

tR{k)  =  tp  +  k-h  for  fc  =  l,2,3,4. 
Set 

=  ¥(.)  +  *:•  ft  •?(.') 

for  A;  =  1,2, 3, 4  and  i  =  1,2, 3, 4. 
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Set  ^ 

Y„=Y,  F„  =  F. 

Next  perform  the  iterative  improvement  (built  into  the  FBI  technique): 

Begin  loop  /  =  1  to  4  (this  is  an  implicit  loop— i.e.,  I  is  not  used  explicitly): 
Begin  loop  K  =  1  to  4: 

Set 

y(/)  =  y(/,/0;/  =  i,2,---,6 
T  =  tn{K) 

F  =  F«.(r,  Y) 

End  loop  K. 

Calculate  Yji{I,K)  =  [Yr]i,k  • 

4 

Yr(I,  K)  =  Yn(/)  +  h-  Bk  •  F„(/)  +  CK,j^R{.h  J) 

j=i 

for  /=  1,2, 3, •••,6;  7'^  =  1,2, 3, 4. 

End  loop  /. 

Update  time  and  the  ‘state  vector’  (i.e.,  Y): 

tp  =  <fl(4) 

Y(7)=Y(7,4)  ,7  =  l,2,3,---,6. 

Next  print  the  output  variables: 

PRINT  OUT:  T  and  Yr. 

Finally  branch  back  to  the  ‘main  do  loop  entry  point: 

End  test  on  T. 

{  End  of  Module} 
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c.  FBI  SIX-FORCE  MODULE  (F,.-:,) 


This  module  (or  ‘SUBROUTINE’)  calculates  six-force. 


PURPOSE  OF  MODULE:  If  the  governing  set  of  second  order  differential  equations  (i.e., 
equation  (A-1)  in  vector  form)  is  recast  as  a  set  of  first  order  differential  equations  (i.e., 
equation  (A-9)  in  vector  form)  a  ‘six-dimensional’  force  (or  ‘six-force  )  evaluation  occurs  on 
the  right-hand  side.  This  module  evaluates  that  force  as  required  by  the  PBI  numerical 
integration  procedure. 


INPUT:  [X3] 


OUTPUT: 


31X6 


VARIABLES  (‘LOCAL’):  [Yg],  [G3],  [Z3] 


First  set  up  the  call  variables  for  the  oblate  gravity  evaluator 


rx,] 

1  fY.’l 

^  X2 

II 

1x3  J 

1  IY3J 

and  then  actually  ‘CALL’  Section  Ill.d  or  Il.d: 

G  =  Gotn(X,  G) . 

Next  set  up  the  value  of  Fgix  and  return: 


{  End  of  Module} 
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d.  PBI  OBLATE  GRAVITY  MODULE  (Gowt— same  as  Section  Il.d) 


This  module  (or  ‘SUBROUTINE’)  calculates  oblate  gravity.  This  section  is  identical  to 
Section  Il.d  and  is  not  reproduced  here  to  eliminate  redundancy— reproduce  the  coding  for 
Section  Il.d  and  insert  it  here. 


[End  of  PBI  Formulation] 
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OBLATE  GRAVITATIONAL  MODEL  DERIVATION 


This  appendix  derives  the  oblate  gravitational  model  and  constants  found  in  Appendix  A. 
Toward  that  end,  first  consider  a  spherical  harmonic  series  expansion  for  the  Earth’s  external 
gravitational  potential.®"^’ The  gravitational  acceleration,  G,  can  then  found  by  taking 
the  gradient  of  the  potential  (V): 

G  =  VV.  (B-1) 

Notice  that  ‘gravitational  acceleration’  does  not  include  centrifugal  force  terms.®"^  (One 
would  expect  centrifugal  terms  not  to  be  included  in  the  gravitational  model  here  since  in¬ 
ertial  coordinate  frames  are  implicitly  assumed  throughout  this  report.)  Physicists  do  not 
commonly  make  the  distinction  between  gravity  and  gravitation  that  geophysicists  make,®"^ 
since  it  is  always  obvious  from  the  physical  context  whether  centrifugal  forces  are  to  be 
included  or  not.  In  this  appendix  the  distinction  between  gravitation  and  gravity  is  main¬ 
tained;  however,  elsewhere  in  the  text  the  more  customary  phrase  ‘oblate  gravity  model’  is 
used  in  place  of  the  (geophysically)  correct  term  ‘oblate  gravitational  model.’  It  is  also  worth 
noting  that  the  sign  convention  on  the  right-hand  side  of  equation  (B-1)  is  the  one  commonly 
used  by  geophysicists®'^’ ®'^  and  it  differs  from  the  one  commonly  used  by  physicists.  The 
sign  convention  of  equation  (B-1)  is  adhered  to  throughout  the  report. 

For  an  oblate  gravitational  model,  it  is  only  necessary  to  retain  the  first  two  dominant 
terms  in  a  spherical  harmonic  expansion:®'^’ ®'^ 

V  =  l  +  ^A(cos^)  ,  (B-2) 

where  R  is  the  magnitude  of  the  position  vector  R  (as  in  Section  Il.d  of  Appendix  A);  ^ 
is  the  angle  between  the  polar  vector  {R3  or  z  direction)  and  the  position  vector  R;  P2 
is  the  second-order  Legendre  polynomial  of  the  first  kind;  a,  J2  and  kM  are  earth-related 
constants;  and  the  origin  of  the  coordinate  system  is  at  the  Earth’s  center.  The  following 
(approximate)  values  of  these  constants  will  be  used  in  the  sequel:®'^ 

kM  =  3.986  X  10®^ 
a  =  6378  km 
J2  =  .0010827  . 

B-1  Heiskanen,  Weikko  A.  and  Moritz,  Helmut,  Physical  Geodesy,  W.  H.  Freeman  and  Co.,  San  Francisco, 
CA,  1967. 

B-2  Kaplan,  Marshall  H.,  Modem  Spacecraft  Dynamics  &  Control,  John  Wiley  &  Sons,  Inc.,  New  York, 
NY,  1976. 
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In  what  follows  let  a:  =  i?i,  j/  =  R2,  z  =  R3  and  r  =  R.  Then 

P2(cos  (j))  =  |(3  cos  2<^  +  1)  =  |(2  cos^  —  1)  +  | 


and  since  cos  <f>  =  zjr  \i  follows  that 

P2(cos<^)  _  3z^  1 

2  2r^ 

Equation  (B-2)  can  thus  be  rewritten  as 

Moreover,  since  Vr  =  R/r  it  follows  that 


and  thus  that 


dx 


—kMx 


kMa'^Ji 


3  X 
2  r® 


15  z^x 
Y~r^ 


dy 


—kMy 


kMa^Ji 


2r5 


15z^y 


?Y. 

dz 


-kMz 


kM  J2 


r®  2  r®  2  r'^ 


(B-3) 

(B-4) 

(B-5) 


Equations  (B-3),  (B-4),  and  (B-5)  can  be  simplified  by  introducing  the  following  variables 

^  ao  •  02 

= - K — 


Gb  =  ^  +  o.(^-i) 


and  set  of  constants 

ao  =  —kM 
02  =  • 

Thus  equations  (B-3),  (B-4),  and  (B-5)  can  be  rewritten  as 

r  .  r 

=  (jri—  X-  Ltb 
OX 


(B-6) 

(B-7) 


?}L 

dy 

dV 

dz 


=  G2  —  y  •  Gb 
=  G3  =  z  •  {Gb  —  2Ga)  , 


B-4  ■ 


NSWCDD/TR-94/217 


in  agreement  with  Section  Il.d  of  Appendix  A,  where 

Gi' 

Gobit  =  *  G2  >  ■ 

Finally,  using  the  values  of  the  Earth-related  constants  given  previously  in  equations  (B-6) 
and  (B-7)  and  switching  to  engineering  units  (i.e.,  feet,  feet  per  second,  etc.)  yields: 

oo  =  -.1407643  X  10^^ 
aj  =  -.71110  X  10^^ 

in  agreement  with  the  values  found  in  Sections  Il.a  and  Ill.a  of  Appendix  A. 
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